improve batch loading of UUIDs stats #154

Open · wants to merge 2 commits into master

JGreenlee
Contributor

https://dash.plotly.com/background-callbacks#using-set_props-within-a-callback

Implements a 'background' callback to improve the batching of `add_user_stats`. This avoids relying on a `dcc.Interval` with a fixed time between batches.

The new callback, which has `background=True`, keeps running on the server until all of the chunks have been processed. At the end of each chunk it calls `set_props`, which sends an update to the UI with the latest data for 'store-loaded-uuids'.

This should speed things up further and prevent any glitches caused by the fixed interval.
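The batching loop described above can be sketched in plain Python. This is a minimal sketch, not the PR's code: `load_stats_in_batches`, `compute_stats`, and `push_update` are hypothetical names, with `push_update` standing in for the `set_props` call and `compute_stats` for the per-user work done by `add_user_stats`. It assumes the store holds the cumulative list of stats (so each push replaces the whole value) and a batch size of 10, matching the first batch of 10 mentioned later in the thread. It does not need Dash to run.

```python
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_stats_in_batches(
    uuids: List[str],
    compute_stats: Callable[[str], dict],       # hypothetical: per-user stats query
    push_update: Callable[[List[dict]], None],  # hypothetical: stands in for set_props
    batch_size: int = 10,
) -> List[dict]:
    """Compute stats chunk by chunk, pushing the cumulative result after each chunk."""
    loaded: List[dict] = []
    for batch in chunked(uuids, batch_size):
        loaded.extend(compute_stats(uuid) for uuid in batch)
        # Roughly where the PR's callback would update 'store-loaded-uuids'.
        # Push a copy so later appends do not mutate what was already sent.
        push_update(list(loaded))
    return loaded


if __name__ == "__main__":
    pushes: List[List[dict]] = []
    uuids = [f"user-{i}" for i in range(25)]
    load_stats_in_batches(uuids, lambda u: {"user_id": u}, pushes.append)
    print([len(p) for p in pushes])  # [10, 20, 25]
```

Because an update goes out as soon as each chunk finishes, the table can grow 10, 20, 25 rows at a time instead of waiting on a fixed polling interval.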

JGreenlee marked this pull request as ready for review on December 13, 2024 22:16
@JGreenlee
Contributor Author

@TeachMeTW at your convenience, could you test this locally on the stage dataset and let me know how it works?

I tested it with smaller datasets and it looks promising, but I am not able to easily load large dumps.

@TeachMeTW
Contributor

TeachMeTW commented Dec 14, 2024

@JGreenlee

I've tested the application locally on the staging dataset and wanted to share my findings:

Error on Overview Page

A nonexistent object was used in an `Input` of a Dash callback. The id of this object is `tabs-datatable` and the property is `value`. The string ids in the current layout are: [url, store-trips, store-uuids, store-excluded-uuids, store-demographics, store-trajectories, page-content, date-picker, collapse-button, collapse-icon, collapse-filters, date-picker-timezone, excluded-subgroups, global-loading, _pages_location, _pages_content, _pages_store, _pages_dummy, card-users, card-active-users, card-trips, fig-sign-up-trend, fig-trips-trend]

Performance Observations

  • Staging Dataset (82 UUIDs): Lightning fast
  • OpenAccess Dataset (200+ UUIDs): Normal (the usual performance, about 20 seconds by ad-hoc measurement)

It appears that the issue might be related to how larger datasets are being handled.

The superfast staging dataset

Screen.Recording.2024-12-13.at.8.12.56.PM.mov

@TeachMeTW
Contributor

@JGreenlee

Screen.Recording.2024-12-17.at.1.35.20.PM.mov

Tested on open_access. The UUIDs do not load in batches; the screen stays blank for 3-4 minutes and then everything loads at once, akin to the pre-batching behavior.

@JGreenlee
Contributor Author

JGreenlee commented Dec 17, 2024

> Tested on open_access. The UUIDs do not load in batches; the screen stays blank for 3-4 minutes and then everything loads at once, akin to the pre-batching behavior.

Thanks for testing. I must have messed something up. I'll check it out tomorrow


Via `running=`, the loading spinner is disabled while the callback executes and reinstated when the callback finishes or is cancelled.
When there are 0 loaded-uuids-stats and >0 total UUIDs (i.e., while the first batch of 10 is loading), a secondary loading spinner is shown in place of the table.
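The two spinner rules above can be modeled in plain Python. This is a hedged sketch: `table_view`, `global_spinner_disabled`, and the `spinner_enabled` key are hypothetical names, and the context manager is only an analogue of Dash's `running=` argument (which takes `(Output, value_while_running, value_when_done)` tuples), not its implementation.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


def table_view(n_loaded: int, n_total: int) -> str:
    """Decide what the UUIDs tab shows in place of the table.

    The secondary spinner appears only while the first batch is loading:
    there are users to show, but no stats have arrived yet.
    """
    if n_total > 0 and n_loaded == 0:
        return "spinner"
    return "table"


@contextmanager
def global_spinner_disabled(ui_state: Dict[str, bool]) -> Iterator[None]:
    """Analogue of `running=`: turn the spinner off while the work runs and
    turn it back on afterwards, whether the work finishes or is interrupted."""
    ui_state["spinner_enabled"] = False
    try:
        yield
    finally:
        # Runs on normal completion and on exceptions (e.g. a cancellation).
        ui_state["spinner_enabled"] = True
```

The `finally` clause is what mirrors "reinstated when the callback finishes or is cancelled": the flag is restored on both paths.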


The Output of the callback is 'all-uuids-stats-loaded', a new store that is not used anywhere. This seems pointless, but it is necessary because the callback's initialization depends on its Output. If it has no Output, it initializes on the Overview page, causing errors. If the Output is 'loaded-uuids-stats', nothing is shown until the callback finishes, which defeats the purpose of batch loading. So I resorted to making a new store for it.
This store is already declared in `app_sidebar_collapsible` (which is global to the entire app).
Redeclaring it on this page probably does nothing, but it might cause issues because Dash doesn't expect two components with the same ID, so I'm removing it.
JGreenlee force-pushed the improve-uuid-batching branch from f7b773d to 616f7da on December 19, 2024 15:21
@JGreenlee
Contributor Author

@TeachMeTW I believe I have resolved the issue. Can you test again on open_access?

I wanted to test it myself on a larger dataset so I went through the trouble of loading in the latest usaid_laos_ev dump. It is not fast, but appears to be batching correctly.

Testing done:

Before the first batch is loaded:
[screenshot]

After the first batch is loaded:
[screenshot]

The callback cancels when switching to a different page or tab (it never got past 70, which is when I switched to the trips tab):
[screenshot]

It starts over when switching back to the UUIDs tab:
[screenshot]

@JGreenlee
Contributor Author

I think Stage is fast while Laos and Open Access are slow simply because Stage has far fewer trips.
I had been comparing the number of users, of which Laos and Open Access have 2-3x more than Stage. However, the most expensive task here is querying for trips, and Laos and Open Access each have roughly 10x more trips than Stage.
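The cost argument above can be written as a rough model. It assumes, without measurement, that load time is roughly linear in the number of users (N_u) and trips (N_t), with per-item costs c_u and c_t. The ratios are the ballpark figures from this comment, not benchmarks:

```latex
\begin{aligned}
T &\approx c_u N_u + c_t N_t \;\approx\; c_t N_t
  \qquad (\text{when } c_t N_t \gg c_u N_u) \\
\frac{T_{\mathrm{Laos}}}{T_{\mathrm{Stage}}}
  &\approx \frac{N_t^{\mathrm{Laos}}}{N_t^{\mathrm{Stage}}} \approx 10,
  \qquad \text{not} \qquad
  \frac{N_u^{\mathrm{Laos}}}{N_u^{\mathrm{Stage}}} \approx 2\text{ to }3
\end{aligned}
```

Under this model, comparing user counts underestimates the slowdown, because the trip term dominates.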

@JGreenlee
Contributor Author

JGreenlee commented Dec 19, 2024

One hiccup I noticed that is annoying for the dev workflow is that autoreload sometimes breaks. Normally, when a file is modified while the dashboard is running, it refreshes itself. But if I do this while the UUIDs tab is showing, it gets stuck and I have to restart the Docker container to see my changes.

But if I move off of the UUIDs tab and then modify a file, it reloads like normal. So I think there is something about having a background callback running that blocks the autoreload process.

Labels
None yet
Projects
Status: PRs for review by peers

2 participants